Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Callback from a notification handler can deadlock #78

Closed

Conversation

radeksimko
Copy link
Contributor

I ran into a bug which appears to be very similar to #27

As demonstrated by the attached (failing) test:
If there's any client -> server notification in-flight while a server notification handler is waiting for client to respond to its request, the Callback() method effectively hangs (even though the client has clearly received the request) and no further requests are processed, effectively causing a deadlock.


One particular (IMO valid) use case where this came up is when I tried to implement LSP server-initiated progress in terraform-ls. That means calling window/workDoneProgress/create from either initialized or textDocument/didOpen notification handlers.

@creachadair
Copy link
Owner

Ah, interesting. Yes, I see the trouble: The replies from the callback are handled during the ordinary request processing sequence. I think I know how to fix this, but I'll have to test out a couple of things. Thanks for reporting the issue, and for the helpful reproduction case.

@creachadair creachadair added the bug Something isn't working label Feb 7, 2022
@creachadair creachadair self-assigned this Feb 7, 2022
@creachadair
Copy link
Owner

creachadair commented Feb 7, 2022

The root problem is that the barrier we added in #21, to address the concern raised in #20, prevents any further request batches from being processed.

Roughly, the sequence of events is this:

  1. The first notification arrives, and is dispatched.
  2. The second notification arrives, and blocks on the barrier (correctly).
  3. The first notification handler issues the callback. (2 & 3 could be swapped wlog)
  4. The callback is sent out and processed by the client.
  5. The client's response is sent back and enqueued. ("Enqueued" here is the problem)

At this point, however, the server will not remove any further items from the queue, by design: We can't issue any further handlers until the barrier completes, to preserve order.

In this case, however, we still want to process messages that do not implicate handlers, i.e,. the response from a callback. I believe we can do this by front-loading the handling of responses. That should be safe to do disregarding the barrier.

creachadair added a commit that referenced this pull request Feb 7, 2022
Previously, a notification handler that issues a call back to the client could
block delivery of the reply for its own callback: The barrier we use to
preserve issue order means another batch cannot be issued to the dispatcher
until all previously-issued notifications have completed.

To prevent the handler from deadlocking itself in this case, filter out
response messages from the client when the input is received, rather than
enqueuing them with the handlers. This basically just moves the existing logic
earlier in the transaction, but it means replies can be delivered even if the
barrier is active.

Fixes #78.
@creachadair
Copy link
Owner

Can you please try patching #79 to see if it addresses your issue? It fixes the repro in the test case, but I assume that is a palimpsest of a more interesting usage.

creachadair added a commit that referenced this pull request Feb 8, 2022
Previously, a notification handler that issues a call back to the client could
block delivery of the reply for its own callback: The barrier we use to
preserve issue order means another batch cannot be issued to the dispatcher
until all previously-issued notifications have completed.

To prevent the handler from deadlocking itself in this case, filter out
response messages from the client when the input is received, rather than
enqueuing them with the handlers. This basically just moves the existing logic
earlier in the transaction, but it means replies can be delivered even if the
barrier is active.

Fixes #78.

* Add regression test for deadlock bug.

Co-authored-by: Radek Simko <radek.simko@gmail.com>
@radeksimko radeksimko deleted the deadlock-notify2callback branch February 9, 2022 08:12
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something isn't working
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants